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Abstract 

Gene expression is largely regulated by DNA methylation, transcription factor (TF), and microRNA (miRNA) before, during, and after 
transcription, respectively. Although the evolutionary effects of TF/miRNA regulations have been widely studied, evolutionary analysis 
of simultaneously accounting for DNA methylation, TF, and miRNA regulations and whether promoter methylation and gene body 
(coding regions) methylation have different effects on the rate of gene evolution remain uninvestigated. Here, we compared human- 
macaque and human-mouse protein evolutionary rates against experimentally determined single base-resolution DNA methylation 
data, revealing that promoter methylation level is positively correlated with protein evolutionary rates but negatively correlated with 
TF/miRNA regulations, whereas the opposite was observed for gene body methylation level. Our results showed that the relative 
importance of these regulatory factors in determining the rate of mammalian protein evolution is as follows: Promoter methylation « 
miRNA regulation > gene body methylation > TF regulation, and further indicated that promoter methylation and miRNA regulation 
have a significant dependent effect on protein evolutionary rates. Although the mechanisms underlying cooperation between DNA 
methylation and TFs/miRNAs in gene regulation remain unclear, our study helps to not only illuminate the impact of these regulatory 
factors on mammalian protein evolution but also their intricate interaction within gene regulatory networks. 

Key words: promoter/gene body methylation, transcription factor, microRNA, protein evolutionary rate, comparative 
genomics. 



Introduction 

Various factors have been known to control gene expression 
and form a complex regulatory network. Regulation of gene 
expression is strongly associated with the maintenance of 
normal cells and a variety of biological functions. The most 
prominent gene regulators are DNA methylation, transcription 
factor (TF), and microRNA (miRNA), which regulate gene 
expression at the pretranscriptional, transcriptional, and 
posttranscriptional levels, respectively. DNA methylation is a 
heritable epigenetic marker that regulates gene expression 
without altering DNA sequence (Egger et al. 2004). This mod- 
ification is highly associated with many cellular processes, 
including transcription, genomic imprinting, suppression of 
transposons, X-chromosome inactivation, chromatin struc- 
ture, embryonic development, and carcinogenesis (Li et al. 
1993; Heard et al. 1997; Walsh et al. 1998; Reik et al. 
2001; Feinberg and Tycko 2004; Laurent et al. 2010). 
Several studies have indicated that promoter methylation 



and gene body methylation exhibit different correlation pat- 
terns with gene expression. Promoter methylation is generally 
associated with transcriptional suppression, whereas gene 
body methylation is associated with active transcription 
(Jones and Takai 2001; Hellman and Chess 2007; Ball et al. 
2009; Bogdanovic and Veenstra 2009; Lister et al. 2009; Feng 
et al. 201 0; Laurent et al. 201 0; Maunakea et al. 201 0; Xiang 
et al. 2010; Zemach et al. 2010; Defossez and Stancheva 
2011; Jones 2012); therefore, methylation in promoter and 
gene body regions may play distinct biological roles. 

As for the two trans-regulatory factors (TFs and miRNAs), 
TFs are proteins that bind to specific DNA sequences (so-called 
"TF binding sites" [TFBSs]) usually located within promoters, 
through which they facilitate or repress the transcription 
of their target genes (Elnitski et al. 2006; Vaquerizas et al. 
2009); on the other hand, miRNAs are small (-22 nt) 
noncoding RNAs that target mRNAs and regulate gene ex- 
pression through mRNA cleavage or translation repression 
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(Bartel 2004, 2009). It has been reported that miRNAs can 
form intricate feedback and feed-forward loops with TFs 
within the context of gene regulatory networks (Hornstein 
and Shomron 2006; Shalgi et al. 2007; Tsang et al. 2007; 
Su et al. 2010). Such miRNA-TF networks are important for 
the stability of gene regulation mechanisms (Shalgi et al. 
2007; Yu et al. 2008), and may play crucial roles in diverse 
biological processes (Marson et al. 2008; Dahan et al. 201 1). 
In contrast, interactions between these two frans-regulatory 
factors (i.e., TFs and miRNAs) and DNA methylation during 
gene regulation are relatively poorly understood. 

In terms of molecular evolution, DNA methylation has been 
reported to remarkably increase the rate of spontaneous 
C-to-T mutations at CpG dinucleotides (Ehrlich and Wang 
1981; Hwang and Green 2004; Mugal and Ellegren 2011), 
resulting in enhanced sequence divergence. On the other 
hand, it has been recently reported that hypermethylated 
genes are subject to stronger selective constraints than hypo- 
methylated genes (Hunt et al. 2010; Lyko et al. 2010; Park 
et al. 2011; Sarda et al. 2012; Takuno and Gaut 2012). We 
previously observed that the first exons of transcripts are more 
susceptible to mutagenic effects, whereas the internal and last 
exons are more affected by the regulatory effects of DNA 
methylation (Chuang et al. 2012). We further indicated that 
the extent of gene body methylation correlates highly with 
within-gene variations (e.g., the type of exonic sequences, 
relative genie position, and degeneracy of coding nucleotides) 
in evolutionary rates at both the exon and nucleotide levels 
(Chuang et al. 2012; Chuang and Chen 2014). On the other 
hand, the evolutionary effects of TFs and miRNAs have been 
carefully examined, revealing that genes targeted by more 
different TFs (number of different TFs designated as "A/ TF ") 
or miRNAs (number of different miRNAs designated as 
"A/ miR ") tend to evolve more slowly in diverse species 
(Cheng et al. 2009; Xia et al. 2009; Wang et al. 2010; 
Chen, Chuang et al. 2011; Chen et al. 2013). These results 
manifested that all these regulatory factors are important in- 
dicators of evolutionary rates, regardless of DNA methylation 
at the pretranscriptional level, TF regulation at the transcrip- 
tional level, or miRNA regulation at the posttranscriptional 
level. However, to the best of our knowledge, no systematic 
evolutionary analysis is currently available that simultaneously 
accounts for these three regulatory factors. The following 
questions still await exploration: whether promoter and 
gene body methylation differentially correlated with the evo- 
lutionary rates of their target genes, A/ TF , and A/ miR ; which of 
these factors has a greater effect on the rate of mammalian 
protein evolution; and whether DNA methylation and trans- 
regulations have an interactive influence on protein evolution- 
ary rates. 

In this study, we collected single base-resolution DNA 
methylation data and TF- and miRNA-binding data from 
human, and systematically examined the correlations between 
these regulatory factors (i.e., levels of promoter/gene body 



methylation, A/ TF , and A/ miR ) and the evolutionary rates of 
their target genes (nonsynonymous substitution rate [oy, 
synonymous substitution rate [d s l and c/ N /c/ s ratio). To control 
for other confounding factors that may affect the evolutionary 
rates of protein-coding genes, we also considered the follow- 
ing eight biological factors in our statistical analyses: 1) Protein 
connectivity (Lemos et al. 2005; Plotkin and Fraser 2007; Xia 
et al. 2009; Liao et al. 2010; Wang et al. 2010); 2) gene 
expression level (Liao et al. 2006, 2010; Larracuente et al. 
2008; Xia et al. 2009; Chen, Chuang et al. 201 1 ; Yang and 
Gaut 2011); 3) tissue-specific gene expression (Liao et al. 
2006, 2010; Larracuente et al. 2008; Park and Choi 2010; 
Yang and Gaut 2011); gene compactness in terms of 4) 
untranslated region (UTR) length (Liao et al. 2006; Cheng 
et al. 2009; Yang and Gaut 2011), 5) intron length (Marais 
et al. 2005; Liao et al. 2006, 201 0; Yang and Gaut 201 1 ), and 
6) intron number (Larracuente et al. 2008; Yang and Gaut 
201 1); and protein structure in terms of 7) solvent accessibility 
(Bloom et al. 2006; Lin et al. 2007; Zhou et al. 2008; Franzosa 
and Xia 2009) and 8) disorder content (Kim et al. 2008; Brown 
et al. 2010; Chen, Chuang et al. 201 1). In this way, we show 
that the levels of DNA methylation of both promoters and 
gene bodies are important indicators of protein evolutionary 
rates (c/ N and c/ N /c/ s ) when other confounding factors are con- 
trolled. Interestingly, protein evolutionary rates are positively 
correlated with the level of promoter methylation, but nega- 
tively correlated with the level of gene body methylation. 
Furthermore, promoter methylation is negatively correlated 
with A/jf and A/ miR , whereas gene body methylation is posi- 
tively correlated with the two trans-regulations. We also 
report that the level of promoter methylation and A/ miR have 
the greatest influence on protein evolutionary rates, and they 
have a dependent effect on the rate of mammalian protein 
evolution. This result supports the previously hypothesized po- 
tential reciprocal regulation between these two regulatory 
factors (Taguchi 2013a, 2013b). 

Materials and Methods 

Collection of Single Base-Resolution DNA Methylation, 
TFBS, and miRNA Target Data 

The human gene annotations, gene orthology assignments, 
and human-mouse/human-rhesus macaque evolutionary 
rates (c/ N , c/ s , and c/ N /c/ s ) were downloaded from the 
Ensembl database at http:/A/vww.ensembl.org/ (last accessed 
November 2013) (version 73). We considered only 1:1 
human-mouse and human-rhesus macaque orthologs to 
avoid the confounding factor of gene duplication; further- 
more, we considered only the longest isoforms of alternatively 
spliced genes. The promoter region of a gene was defined as 
the intergenic region from 8 kb upstream to 2 kb downstream 
of the Ensemble-annotated gene start position. Genes with 
genie regions (including exons and introns) <2 kb in length 
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were not considered. For accuracy, a gene whose promoter 
region overlaps with other gene(s) was not considered. The 
base-resolution DNA methylation data from 12 human cell 
types (including cultured cells, cells from healthy individuals 
[SI— S1 1 , table 1], and breast cancer cells [supplementary 
table S1, Supplementary material online]) were downloaded 
from NGSmethDB v2 (Geisen et al. 2014) at http://bioinfo2. 
ugr.es/NGSmethDB/ (last accessed October 2013). These data 
sets were generated with bisulfite or MethylC sequencing. 
CpG dinucleotides with single nucleotide variants were not 
considered in this study to avoid potential sequencing errors. 
To ensure the accuracy of the methylome data, only CpG 
dinucleotides covered by >5 bisulfite/MethylC reads were 
considered (such CpG dinucleotides were designated as 
"sampled CpGs"). A sampled CpG site is regarded as meth- 
ylated (designated as "mCG") if >80% of the mapped reads 
support the methylation status at the CpG site (Meissner et al. 
2008; Laurent et al. 2010). We only considered the genes 
whose promoter and gene body regions both contained 
>10 sampled CpGs to ensure that the examined regions 
contained sufficient information for estimating the methyla- 
tion level. Of note, the gene body region of a gene represents 
the coding sequence, with the exception of regions overlap- 
ping with the promoter of the gene (e.g., 2 kb downstream of 
the Ensemble-annotated gene start position). 

TFBS data were obtained by downloading chromatin im- 
munoprecipitation (ChIP) data (which includes 162 human TF 
ChlP-seq data sets) from the ENCODE project (Bernstein et al. 
201 2). A given TF was considered to regulate a gene if at least 
one of its ChlP-seq peaks was located within the promoter 
region of the gene. Human miRNA target prediction data 



(which include 1,267 miRNAs) were downloaded from 
TargetScan release 6.2 (April 2013) (Ruby et al. 2007; 
Friedman et al. 2009). For accuracy, we only considered 
the human miRNA families in which the corresponding 
target sites were determined to be conserved across mam- 
mals using TargetScan (Friedman et al. 2009). The human, 
rhesus macaque, and mouse genes examined in this study 
(together with the related information) are available at 
http://idv.sinica.edu.tw/trees/DM_TF_miRNA/DM_TF_miRNA. 
html (last accessed June 25, 2014). 

Measurement of CpG Dinucleotide Depletion (CpG Q /E) 
and the Methylation Level in Promoter and Gene Body 
Regions 

Since a low ratio of observed-to-expected CpG dinucleotides 
(CpGq/e) represents a large fraction of mutated CpG dinucle- 
otides, CpG 0 /E is a measurement of CpG dinucleotide deple- 
tion (Bird and Taggart 1980; Park et al. 201 1, 2012). CpG 0/E 
was defined as follows: 



number of CpGs x length of the examined region 
number of Cs x number of Gs 

where P CpGl P Cl and P G represent the frequency of CpG 
dinucleotides, C nucleotides, and G nucleotides, respectively, 
in the examined promoter/gene body regions. The methyla- 
tion level of an examined region was measured by calculating 
the density of mCG per 1 00 CpG dinucleotides (mCG density). 
mCG density was defined as 



Table 1 



Experimentally Determined Single Base-Resolution DNA Methylation Data Used in this Study 


Sample 


Description (Ref.) 


No. of Genes 


Average CpG 


Average CpG 






(Sampled #CG>10) 


Coverage 3 


Coverage 3 








(Promoter) (%) 


(Gene Body) (%) 


S1 


Peripheral blood B lymphocytes 


10,627 


76.04 


86.69 




(Hodges et al. 2011) 








S2 


Peripheral blood hematopoietic stem/progenitor cells 


10,451 


78.03 


89.10 




(CD133+CD34+CD38-Lin-) (Hodges et al. 2011) 








S3 


Newborn foreskin fibroblasts (Laurent et al. 2010) 


10,388 


89.34 


93.56 


S4 


H1 embryonic stem cells (Lister et al. 2009) 


10,751 


68.54 


89.41 


S5 


Breast cells from adult female (Hon et al. 2012) 


10,916 


96.78 


98.75 


S6 


Peripheral blood hematopoietic stem/progenitor cells 


10,722 


80.04 


89.41 




(CD34+CD38-Lin-) (Hodges et al. 2011) 








S7 


Fetal lung fibroblasts (Lister et al. 2009) 


10,848 


78.13 


93.62 


S8 


Peripheral blood granulocytic neutrophils 


10,081 


71.27 


82.91 




(Hodges et al. 2011) 








S9 


Prefrontal cortex (Zeng et a I. 2012) 


10,556 


80.58 


90.38 


S10 


H9 embryonic stem cells (Laurent et al. 2010) 


10,522 


91.81 


95.31 


S11 


Fibroblasts derived from H9 embryonic stem cells 


10,436 


90.01 


93.65 




(Laurent et al. 2010) 









a Coverage of CpG dinucleotides for each promoter/gene body (protein-coding) region = (number of sampled CpG dinucleotides)/(number of 
sampled CpG dinucleotides + number of nonsampled CpG dinucleotides). 
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mCG density 



number of mCGs x 100 
number of all CpGs sampled ' 



Data Retrieval of Protein Connectivity, Gene Expression, 
Gene Compactness, and Protein Structural Features 

Protein connectivity (PPI) data were downloaded from STRING 
9.0 (Szklarczyk et al. 2011). Gene expression data were 
obtained by downloading normalized expression data sets 
for 78 nonpathogenic human tissues from BioGPS (Wu 
et al. 2009). If more than one probe set referred to the 
same gene, the signals from the relevant probe sets were 
averaged. The gene expression level was defined as the aver- 
age signal intensity across the 78 examined tissues. The tissue 
specificity (r) of a gene was defined as follows: 



x = 



y^n / log 5(Q \ 



n- 1 



where n, S(i), and 5 max denote the number of examined tissues 
(i.e., n = 7S), the signal intensity, and the highest signal across 
all examined tissues, respectively (Yanai et al. 2005). A large 
r value indicates high tissue specificity for a gene. To minimize 
potential noise that might be caused by low signal intensities, 
values of 5{i) less than 100 were set to 100 (Liao and Zhang 
2006; Liao et al. 2006; Chen et al. 2010). Gene compactness 
was described based on intron number and average length of 
UTR/intron for a given gene. Protein structure was described 
based on solvent accessibility and disorder content. The sol- 
vent accessibility of a protein was calculated from the maxi- 
mum number of exposed residues interacting with solvent 
molecules over the protein's length; the exposed residues 
were determined using ACCPro release 4.1 with the default 
parameters (Cheng et al. 2005). Only proteins <8,000 amino 
acids in length were considered due to limitations of ACCPro 
(Cheng et al. 2005). The disorder content of a protein was 
defined as the percentage of intrinsically disordered regions, 
estimated by dividing the number of disordered residues by 
protein length. The disordered residues were predicted using 
DISOPRED2 version 2.4 with default parameters (Akgul et al. 
2004). To minimize the standard error when calculating 
disorder content, we only considered proteins >100 amino 
acids in length (Chen, Chuang et al. 201 1). 

Calculation of the Relative Contribution to Variability 
Explained 

The relative contribution to variability explained (RCVE) is used 
to assess the relative importance of each tested factor, which 
is defined as follows: 



RCVE : 



/?2 _ p2 
n full n reduced 



where R 2 ull and Educed denote the R 2 value (share of variability 
explained) of the full model (including all of the tested factors) 
and that of the reduced model (excluding the factor of inter- 
est), respectively. R 2 is the square of the coefficient of corre- 
lation between the observations and their predicted values in a 
multiple linear regression model, which is a measure of the 
proportion of total variation of observed outcomes explained 
by the model. Accordingly, the RCVE coefficient represents 
the relative contribution of a factor in the context of all 
other factors included in the full model, which ranges from 
0 to 1 with a higher value indicating a more important con- 
tribution of the factor of interest to the regression model 
(Kvikstad et al. 2007). 

Results and Discussion 

Promoter and Gene Body Methylation Exhibit Opposite 
Correlation Patterns with the Protein Evolutionary Rates 
of Their Target Genes 

To investigate the levels of DNA methylation in human 
promoter and gene body regions, we retrieved single base- 
resolution DNA methylation data from 1 1 human cell lines 
(see Materials and Methods; table 1). It should be noted 
that the gene body regions examined were coding sequences, 
with the exception of regions overlapping with the promoter 
regions. Table 1 shows the number of genes examined in each 
data set. The level of DNA methylation was measured by the 
density of mCG (see Materials and Methods). For each gene 
considered, both the examined promoter and gene body 
regions contained at least 10 sampled CpGs (see Materials 
and Methods). On average, over 68% of CpGs were sampled 
for both examined promoters and gene bodies (table 1), indi- 
cating that sufficient CpG dinucleotides were sampled in the 
examined regions. 

To probe the impact of promoter and gene body methyl- 
ation on evolution, we first examined the correlation between 
CpGo/E (i-e., the ratio of the observed-to-expected number of 
CpG dinucleotides, see Materials and Methods) and pro- 
moter/gene body mCG density. Low CpG Q /E ratios (indicating 
a high level of C-to-T mutations) have been reported to be 
caused primarily by DNA methylation (Bird and Taggart 1980; 
Park et al. 2011, 2012); therefore, an inverse correlation 
is observed between CpG 0 /E and DNA methylation (Bird 
and Taggart 1980; Zemach et al. 2010; Park et al. 2011). 
As such, the coefficient of correlation between these two 
measurements can be used to estimate the proportion of 
methylated CpGs that have undergone mutation (Chuang 
et al. 2012). As shown in figure 1, both promoter and 
gene body mCG densities were generally negatively correlated 
with CpGo/E- Meanwhile, the absolute values of the 
Spearman's rank correlation coefficient (p) between CpG Q /E 
and mCG densities were observed to be generally higher 
in promoters than in gene bodies (fig. 1), suggesting that 
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■ Promoter Gene body 
S1 S2 S3 S4 S5 S6 S7 S8 S9 S10 S11 



-0.1 
-0.2 
-0.3 
-0.4 
-0.5 
-0.6 
-0.7 
-0.8 



Fig. 1. — Spearman's rank correlation coefficient (p) between CpG 0 /E 
and mCG density for the promoter and gene body regions in the 1 1 
analyzed methylomes. *P<0.05 and ***p< 0.001. 
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mN TF 
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Promoter mCG density 



Gene body mCG density 







mN TF 











Promoter mCG density 



Gene body mCG density 



Fig. 2. — Spearman's rank correlation coefficient (p) between average 
promoter/gene body mCG densities for the 1 1 analyzed samples (table 1) 
and the 2 regulatory factors (A/ TF or A/ miR ) (A) before and (£) after control- 
ling for 10 other confounding factors (see text). The analyses were based 
on 5,418 genes containing all confounding factors examined. 
***P< 0.001. 



promoter regions are more susceptible to the mutagenic 
effect of DNA methylation than gene bodies. This result 
also echoes our earlier report that DNA methylation tends to 
more easily induce C-to-T mutations in the first exons than in 
internal/last exons (Chuang et al. 2012), and implies dif- 
ferential evolutionary effects of promoter and gene body 
methylation. 



We next examined whether promoter and gene body 
methylation levels are differentially correlated with the evolu- 
tionary rates (d N , d Sl and d^ds) of their target genes. Here, we 
examined the level of methylation using the average mCG 
density across the 1 1 methylomes in table 1 . As shown in 
table 2, the c/ N , d s , and d^ds values of target genes are all 
positively correlated with average mCG density in promoters, 
but negatively correlated with that in gene bodies, for both 
human-macaque and human-mouse comparisons (all P 
values <1CT 7 ). This result reveals that DNA methylation 
levels of promoters and gene bodies have opposite effects 
on the evolutionary rates of their target genes. To control 
for other confounding factors that may affect the evolutionary 
rates of protein-coding genes, we used partial correlation 
analyses (Kim and Yi 2007) to simultaneously control for 
gene body (or promoter) mCG density, trans-regulation (A/ TF 
and A/ miR ), and eight other confounding factors as stated in 
Introduction section: Protein connectivity, gene expression 
(level and tissue specificity), gene compactness (UTR length, 
intron length, and intron number), and protein structure (sol- 
vent accessibility and disorder content). We found that protein 
evolutionary rates (c/ N and c/ N /c/ s values) remain positively cor- 
related with promoter methylation level and negatively corre- 
lated with gene body methylation level (table 2). The results 
thus indicate that both promoter and gene body methylation 
levels are important indicators of protein evolutionary rates, 
even though the evolutionary effect of gene body methylation 
is more influenced by the aforementioned confounding fac- 
tors than that of promoter methylation. Of interest, the partial 
correlation between gene body methylation level and d s dis- 
appears or even becomes positive after the control (table 2), 
suggesting that highly methylated regions are subject to 
strong selective pressure at the protein level despite the en- 
hanced mutation rate (resulting in elevated d s ) in gene bodies. 
Moreover, the absolute correlation coefficient value for pro- 
moter mCG density is greater than that for gene body mCG 
density after the control (table 2). This indicates that c/ N and 
cVc/s are more strongly correlated with promoter mCG den- 
sity than with gene body mCG density. We thus suggest that 
the rate of mammalian protein evolution may be influenced 
more by promoter methylation than by gene body methyla- 
tion. Furthermore, we also examined the correlation between 
protein evolutionary rates and gene promoter/gene body 
mCG density in turn by controlling for each of the abovemen- 
tioned confounding factors. The trends that protein evolution- 
ary rates are positively correlated with promoter methylation 
level and negatively correlated with gene body methylation 
level are generally maintained (supplementary table S2, 
Supplementary Material online). The absolute correlation co- 
efficient values also indicate that the correlations between 
protein evolutionary rates and promoter/gene body methyla- 
tion levels are greatly influenced by A/ TF and A/ miR , although all 
the correlations remain significant (supplementary table S2, 
Supplementary Material online). 
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Table 2 

Spearman's Rank Correlation Coefficient (p) between Promoter (or Gene Body) mCG Density and the Evolutionary Rates (c/ N , d Sl and 
c/n/c/s) for Human-Macaque and Human-Mouse Comparisons Before and After Controlling for Gene Body (or Promoter) mCG 
Density and Ten Other Confounding Factors 







Before Control 






After Control 




c/ N 


d s 


c/n/c/s 


c/n 


ds 


c/n/c/s 


Human-macaque 3 














Promoter methylation 


0.2052*** 


0.0798*** 


0.1928*** 


0.1434*** 


0.0009 


0.1601*** 


Gene body methylation 


-0.0975*** 


-0.0872*** 


-0.0803*** 


-0.0607*** 


-0.0121 


-0.0718*** 


Human-mouse b 














Promoter methylation 


0.268*** 


0.1453*** 


0.2304*** 


0.1907*** 


0.0329* 


0.1827*** 


Gene body methylation 


-0.096*** 


-0.0768*** 


-0.0793*** 


-0.0289* 


0.025 


-0.0452*** 



Note. — The ten confounding factors are A/ TF , N miRl protein connectivity, expression level, tissue specificity, UTR length, intron length, intron number, 
solvent accessibility, and disorder content. 

a The analysis was based on 5,128 human genes and their macaque orthologs. 
b The analysis was based on 5,357 human genes and their mouse orthologs. 
*P<0.05 and ***P< 0.001. 



Promoter and Gene Body Methylation Levels Exhibit 
Different Correlation Patterns with TF/miRNA Regulations 

We proceeded to investigate the relationships between DNA 
methylation in promoter/gene body regions and the two 
frans-regulations (A/ TF and A/ miR ); Spearman's correlation 
analyses revealed that A/ TF and A/ miR both exhibit a negative 
correlation with promoter mCG density, but a positive correla- 
tion with gene body mCG density (all P values < 1 CP 8 ; fig. 2A). 
Because A/ TF and A/ miR may also be correlated with other con- 
founding factors as stated in the last section, we in turn per- 
formed partial correlation between one of the promoter/gene 
body mCG densities and one of the two frans-regulations (A/ TF 
or A/ miR ) by simultaneously controlling for two other factors 
and the eight abovementioned confounding factors (fig. IB). 
After the control, we found that A/ TF and A/ miR still exhibit a 
negative correlation with promoter mCG density, and A/ TF 
remains positively correlated with gene body mCG density, 
whereas the partial correlation between A/ miR and gene 
body mCG density disappears (fig. 2B). This suggests that 
the correlation between A/ miR and gene body methylation is 
greatly influenced by other confounding factors. In addition, 
the absolute correlation coefficient values also suggest that 
both promoter and gene body mCG densities are more 
strongly correlated with A/ TF than with A/ miR . 

Regarding promoter methylation, it was previously 
reported that the interactions between TFBSs and their corre- 
sponding TFs are sensitive to DNA methylation (Chen, Feng 
et al. 201 1), and TFBSs tend to be hypomethylated to prevent 
destabilization of TF-DNA interactions (Siegfried et al. 1999; 
Lister et al. 2009; Straussman et al. 2009). In addition, it has 
been suggested that promoter methylation and miRNA regu- 
lation may complement each other's function, and thus the 
promoters of genes regulated by more miRNAs tend to have a 
lower level of DNA methylation (Su et al 201 1). These inter- 
connections between biological features may lead to such a 



negative correlation between promoter mCG density and 
theses two trans-regulations. 

Regarding gene body methylation, body-methylated genes 
have been suggested to be functional important and repre- 
sent housekeeping functions (Sarda et al. 2012; Takuno and 
Gaut 2012). Densely methylated genes tend to evolve more 
slowly than sparsely methylated genes (Hunt et al. 201 0; Lyko 
et al. 201 0; Park et al. 201 1 ; Sarda et al. 201 2; Takuno and 
Gaut 201 2). These observations indicate a negative correlation 
between gene body methylation and protein evolutionary 
rates (c/ N and c/ N /c/ s ). As for TF/miRNA regulations, genes 
targeted by more TFs or miRNAs tend to be under stronger 
selective constraints (Cheng et al. 2009; Xia et al. 2009; Wang 
et al. 201 0; Chen, Chuang et al. 201 1 ). Such a trend is broadly 
maintained throughout metazoa (Chen et al. 2013). Thus, 
both A/ TF and A/ miR are negatively correlated with protein evo- 
lutionary rates. In addition, a positive correlation was observed 
between A/ TF and A/ miR (Cui et al. 2007; Chen et al. 2013). 
These findings thus reveal that these three factors (gene body 
mCG density, A/ TF , and A/ miR ) are all negatively correlated with 
c/ N and c/ N /c/ s , implying a positive correlation between gene 
body mCG density and frans-regulations. 

Taken together, the trend that promoter and gene body 
methylation levels have different correlation patterns with the 
two trans-regulations (A/ TF and A/ miR ) also reflects the different 
effects of promoter and gene body methylation on the protein 
evolutionary rates of target genes (table 2). This also implies 
the existence of a complicated interaction between these two 
frans-regulations and DNA methylation in different regions. 

Promoter Methylation and miRNA Regulation Are Major 
Factors of Protein Evolutionary Rates 

The above analyses indicated that both promoter and gene 
body methylation levels are important indicators of protein 
evolutionary rates (c/ N and c/ N /d s ). We next set out to 



Genome Biol. Evol. 6(6): 1 530-1 541 . doi:10.1093/gbe/evu124 Advance Access publication June 12, 2014 



1535 



Chuang and Chiang 



GBE 



determine which biological factor(s) is/are the major factor(s) 
of protein evolutionary rates. We previously reported that of 
the ten biological factors associated with evolutionary rates of 
proteins (A/ TF , A/ miR , protein connectivity, expression level, 
tissue specificity, UTR length, intron length, intron number, 
solvent accessibility, and disorder content), A/ miR tends to be 
the most important factor of c/ N and d N /d s in mammals (Chen 
et al. 2013). We therefore estimated the relative importance 
of these ten factors and promoter/gene body methylation 
levels in determining the rate of mammalian protein evolution. 
Using partial correlation analysis, we examined the correla- 
tions between protein evolutionary rates and each of these 
12 factors in turn by simultaneously controlling for the other 
1 1 factors. As shown in figure 3, we found that promoter 
mCG density and A/ miR had the greatest influence on c/ N and 
c/ N /d s for both human-macaque and human-mouse compar- 
isons. Gene body mCG density has a relatively lower effect on 
c/ N and d N /d s than promoter mCG density and A/ miR , but a 
relatively greater effect than A/ TF (fig. 3). We also performed 
a linear repression model, RCVE (see Materials and Methods), 
to measure the relative effect of these 1 2 factors in determin- 
ing c/ N and c/n/c/s, and showed the similar trends (supplemen- 
tary fig. S1, Supplementary Material online). 

As discussed above, promoter mCG density and A/ miR have 
opposite effects on c/ N and c/ N /c/ s . The finding that they are 
both the major factors of protein evolutionary rates thus raises 
the question of whether these two rate determinants have an 
interaction impact (especially mutual impact) on protein evo- 
lutionary rates. This will be discussed in the next section. 

Promoter Methylation and miRNA Regulation Exhibit 
Dependent Effects on Protein Evolutionary Rates in 
Mammals 

Recent studies reported that promoter methylation and 
miRNA may coregulate their target genes. Changes in pro- 
moter methylation may affect miRNA targeting, suggesting a 
mutual correlation between miRNA-mediated regulation of 
target genes and miRNA-targeting-specific promoter methyl- 
ation in brain (Taguchi 2013a, 2013b). Promoter methylation 
of miRNA-targeted genes has also been suggested to be 
highly correlated with miRNA seed region features (Taguchi 
2013b). For example, the promoters of genes targeted by 
miR-548 tend to be significantly hypomethylated (Taguchi 
2013b). Figure 2 also reveals a significantly negative correla- 
tion between mCG density and A/ miR . Therefore, we are curi- 
ous about whether there is a dependent effect of promoter 
mCG density and A/ miR on protein evolutionary rates. To this 
end, we conducted a stepwise multiple regression analysis 
including promoter and gene body mCG densities and the 
aforementioned ten biological factors to examine the interac- 
tion of the effects of these factors on c/ N /c/ s . The stepwise 
model selection showed that the coefficient of promoter 
mCG density-A/ miR interaction terms significantly deviate 



from zero in both human-macaque and human-mouse com- 
parisons (supplementary table S3, Supplementary Material 
online), suggesting the dependence between promoter meth- 
ylation and miRNA regulations in determining d N /d s . 

To further probe the interaction impact of promoter 
methylation and miRNA regulations on mammalian protein 
evolution, we divided the human protein-coding genes 
into five groups of similar size, according to the magnitudes 
of c/ N or c/ N /c/ s , and calculated the median values of promoter 
mCG density and A/ miR for each group of genes. As shown in 
figure 4, the lower the promoter mCG density, the lower the 
c/ N and c/ N /c/ s values for both human-macaque and human- 
mouse comparisons; on the other hand, an opposite correla- 
tion was observed between A/ miR and protein evolutionary 
rates. In other words, genes with hypomethylated promoters 
and strong miRNA regulation are subject to stringent selec- 
tive constraints; in contrast, genes with hypermethylated 
promoters and weak miRNA regulation are subject to relaxed 
selective constraints. This result indicates that these two 
factors have a mutual impact on protein evolutionary rates, 
also reflecting the above observations that c/ N and d^/ds values 
are negatively correlated with A/ miR but positively correlated 
with promoter mCG density (table 2). 

Potential Caveats 

In this study, the single base-resolution methylome data were 
derived from cultured cells or cells from healthy individuals 
(table 1). It is possible that methylome data from cancerous 
cell lines may have introduced bias in the trends we observed. 
To examine this possibility, we extracted single base-resolution 
methylome data from breast cancer cells (supplementary table 
S1, Supplementary Material online) (Hon et al. 2012), and 
performed the same analyses as described above. We found 
the similar trends as described above: promoter methylation is 
positively correlated with protein evolutionary rates but neg- 
atively correlated with A/ TF and A/ miR , whereas the opposite 
was observed for gene body methylation (supplementary 
tables S4 and S5 and fig. S2, Supplementary Material 
online); the relative importance of these regulatory factors in 
determining the rate of mammalian protein evolution is as 
follows: promoter methylation « miRNA regulation > gene 
body methylation > TF regulation (supplementary fig. S3, 
Supplementary Material online); and promoter methylation 
and miRNA regulation have a significant dependent effect 
on protein evolutionary rates (supplementary fig. S4 and 
table S6, Supplementary Material online). In addition, because 
rodents have a faster molecular clock than primates (Li 1997; 
Nekrutenko et al. 2003), it is possible that the observed trends 
may be biased toward the compared species with different 
molecular clocks. In this study, we performed all statistical 
analyses on both comparisons between primate and rodent 
(i.e., human vs. mouse) and those between primates (i.e., 
human vs. rhesus macaque), and showed that they had the 
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Fig. 3. — Absolute values of Spearman's rank correlation coefficients (p) between protein evolutionary rates (c/ N and d^d s ) and one of the indicated 
factors (promoter/gene body mCG density [mCG p and mCG g ], A/ TF , A/ miR , protein connectivity [PPI], expression level [ExpLvl], tissue specificity [r], UTR length 
[UTR_L], intron length [ln_L], intron number [InNum], solvent accessibility [SolAcc], or disorder content [DisCont]), while simultaneously controlling for the 
other 1 1 factors, for both human-macaque (5,128 genes) and human-mouse (5,357 genes) orthologs. 



same tendencies. We thus suggest that the observed trends 
are not biased toward different molecular clocks. 



Conclusions 

In this study, we examined the impacts of promoter/gene 
body methylation, TF regulation, and miRNA regulation 
(which act before, during, and after transcription, respectively) 



on the evolutionary rates of the target protein-coding genes. 
We made several findings. First, promoter and gene body 
methylation levels exhibit opposite correlation patterns with 
protein evolutionary rates (c/ N and c/ N /c/ s ): the former exhibits a 
positive correlation with c/ N and d^d s , whereas the latter 
exhibits an inverse correlation with these two evolutionary 
rates (table 2). On the basis of partial correlation analysis, 
we emphasize that these correlations are maintained after 
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Fig. 4. — The median values of promoter mCG density (mCG p ) and A/ miR in five groups of human protein-coding genes of similar size (divided according 
to the magnitudes of c/ N and d^d Sl from low to high) for both human-macaque (5,128 genes) and human-mouse (5,357 genes) comparisons. 



excluding the effect of other confounding factors (table 2), 
indicating that both promoter and gene body methylation 
levels are important indicators of protein evolutionary rates 
of their target genes. We also demonstrated that protein evo- 
lutionary rates are more strongly correlated with promoter 



methylation level than with gene body methylation level. 
Second, promoter and gene body methylation levels also 
exhibit different correlation patterns with the two trans- 
regulations (A/ TF and A/ miR ); the former is negatively correlated 
with A/ TF and A/ miR , whereas the latter is positively correlated 
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with these trans-regulations (fig. 2). Because A/ TF and A/ miR 
have been previously reported to be positively correlated 
with each other but inversely correlated with protein evolu- 
tionary rates (Chen et al. 2013), the correlations between 
promoter/gene body mCG density, A/ TF , A/ miR , and protein 
evolutionary rates can be summarized in figure 5. Third, we 
established that the relative importance of these regulatory 
factors in determining the protein evolutionary rates is as 
follows: promoter mCG density « A/ miR >gene body mCG 
density > A/ TF . We further determined that, among the 12 
biological factors considered, promoter methylation and 
miRNA regulation are generally the major factors in determin- 
ing d N and c/ N /c/ s . Finally, we demonstrated that these two 
factors have a dependent effect on protein evolutionary 
rates, and they have a mutual impact on protein evolutionary 
rates. In summary, our results indicate the complicated effects 
of natural selection on protein evolution, and the intricate 
relationships between regulatory systems acting before, 
during, and after transcription. This study thus increases our 
understanding of DNA methylation, TF, and miRNA regula- 
tions in evolutionary biology. 

Supplementary Material 

Supplementary figures S1-S4 and tables S1-S6 are available 
at Genome Biology and Evolution online (http:/A/vww.gbe. 
oxfordjournals.org/). 
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